[WIP][RFC] Add sparse host buffer source #16252
Draft
Related to #15919
DISCLAIMER: This is meant to be a rough RFC to illustrate my idea for a temporary mitigation strategy for `NativeFile` removal. I am probably not the right person to take the libcudf changes over the line, but I still threw some C++ code together for fun :)

Idea: We don't need `NativeFile` support to get reasonable partial-IO performance from remote storage if we only transfer the necessary byte ranges into host memory with fsspec and libcudf is able to read from the sparse `<offset, byte-range>` mapping. The fsspec component is covered in #16166, but that PR currently wastes a lot of host memory by copying the necessary byte ranges into a larger "proxy" byte range that matches the size of the actual file.

@vuule @vyasr - Does a "sparse" host buffer source seem like a reasonable approach for the near term?
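
To make the sparse `<offset, byte-range>` idea concrete, here is a minimal Python sketch of the data structure. It is not the libcudf implementation or the code in this PR: the class name `SparseHostBuffer` and its methods are hypothetical, and the byte ranges are hard-coded here rather than fetched with fsspec. It only illustrates how a reader could serve `read(offset, size)` requests from a handful of resident ranges without allocating a buffer the size of the whole file.

```python
import bisect


class SparseHostBuffer:
    """Hypothetical sparse view of a file: only selected byte ranges are resident."""

    def __init__(self, ranges):
        # `ranges` maps an absolute file offset to the bytes stored at that offset.
        # Assumption for this sketch: the ranges do not overlap.
        items = sorted(ranges.items())
        self._offsets = [offset for offset, _ in items]
        self._chunks = [bytes(chunk) for _, chunk in items]

    def read(self, offset, size):
        """Return `size` bytes starting at absolute file `offset`."""
        # Locate the last resident range that starts at or before `offset`.
        i = bisect.bisect_right(self._offsets, offset) - 1
        if i < 0:
            raise ValueError(f"offset {offset} precedes every resident range")
        rel = offset - self._offsets[i]
        chunk = self._chunks[i]
        if rel + size > len(chunk):
            raise ValueError(f"bytes [{offset}, {offset + size}) are not resident")
        return chunk[rel:rel + size]

    def resident_bytes(self):
        """Total host memory held by the resident ranges."""
        return sum(len(chunk) for chunk in self._chunks)


# Two small ranges of a (pretend) large remote file.
buf = SparseHostBuffer({100: b"abcdef", 1000: b"xyz"})
data = buf.read(102, 3)
```

In this sketch the host-memory footprint is the sum of the transferred ranges (9 bytes above), whereas a "proxy" buffer matching the file size would need at least 1003 bytes to hold the same two ranges at their original offsets.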